CoGenT++: an extensive and extensible data environment for computational genomics

Authors

  • Leon Goldovsky
  • Paul Janssen
  • Dag G. Ahrén
  • Benjamin Audit
  • Ildefonso Cases
  • Nikos Darzentas
  • Anton J. Enright
  • Núria López-Bigas
  • José M. Peregrín-Alvarez
  • Mike Smith
  • Sophia Tsoka
  • Victor Kunin
  • Christos A. Ouzounis

Abstract

MOTIVATION: CoGenT++ is a data environment for computational research in comparative and functional genomics, designed to address issues of consistency, reproducibility, scalability and accessibility.

DESCRIPTION: CoGenT++ facilitates the re-distribution of all fully sequenced and published genomes, storing information about species, gene names and protein sequences. We describe our scalable implementation of ProXSim, a continually updated all-against-all similarity database that stores pairwise relationships between all genome sequences. Based on these similarities, derived databases are generated for gene fusions (AllFuse), putative orthologs (OFAM), protein families (TRIBES), phylogenetic profiles (ProfUse) and phylogenetic trees. Extensions based on the CoGenT++ environment include disease gene prediction, pattern discovery, automated domain detection, genome annotation and ancestral reconstruction.

CONCLUSION: CoGenT++ provides a comprehensive environment for computational genomics, designed primarily for large-scale analyses but also accessible for manual browsing.
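A phylogenetic profile records, for each protein, the genomes in which it has a detectable homolog. The following is a minimal Python sketch of deriving such profiles from all-against-all similarity hits. It is an illustration under stated assumptions, not the ProXSim or ProfUse implementation: the hit-record layout (query, target, E-value), the `genome_of` mapping, the 1e-6 E-value cutoff, the function names and the example protein identifiers are all hypothetical.

```python
from collections import defaultdict


def phylogenetic_profiles(hits, genome_of, genomes, evalue_cutoff=1e-6):
    """Return {protein: tuple of 0/1 flags, one per genome in `genomes`}.

    `hits` is an iterable of hypothetical (query, target, evalue) records,
    standing in for pairwise results from an all-against-all comparison.
    Only hits at or below `evalue_cutoff` count, and only the query side
    of each record is credited (an all-against-all run lists both directions).
    """
    present = defaultdict(set)
    for query, target, evalue in hits:
        if evalue <= evalue_cutoff:
            present[query].add(genome_of[target])

    profiles = {}
    for protein, home in genome_of.items():
        # A protein is always counted as present in its own genome.
        found = present.get(protein, set()) | {home}
        profiles[protein] = tuple(1 if g in found else 0 for g in genomes)
    return profiles


def group_by_profile(profiles):
    """Group proteins that share an identical profile, in sorted order."""
    groups = defaultdict(list)
    for protein, profile in sorted(profiles.items()):
        groups[profile].append(protein)
    return dict(groups)


if __name__ == "__main__":
    # Toy data with hypothetical identifiers; not taken from CoGenT++.
    genomes = ["ecoli", "bsub", "mjan"]
    genome_of = {"ecA": "ecoli", "ecB": "ecoli", "bsA": "bsub", "mjA": "mjan"}
    hits = [("ecA", "bsA", 1e-30), ("ecB", "bsA", 1e-25), ("ecA", "mjA", 0.5)]
    profiles = phylogenetic_profiles(hits, genome_of, genomes)
    for profile, members in group_by_profile(profiles).items():
        print(profile, members)
```

Proteins that share an identical profile (here `ecA` and `ecB`) are candidates for a functional link. Practical analyses usually also score near-identical profiles, which this sketch deliberately leaves out.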

Similar resources

COmplete GENome Tracking (COGENT): A Flexible Data Environment for Computational Genomics

SUMMARY We present a database of fully sequenced and published genomes to facilitate the re-distribution of data and ensure reproducibility of results in the field of computational genomics. For its design we have implemented an extremely simple yet powerful schema to allow linking of genome sequence data to other resources. AVAILABILITY http://maine.ebi.ac.uk:8000/services/cogent/

A New Job Scheduling in Data Grid Environment Based on Data and Computational Resource Availability

A Data Grid is an infrastructure that manages a huge number of data files and provides intensive computational resources across geographically distributed collaborations. The heterogeneity and geographic dispersion of grid resources and applications pose complex problems such as job scheduling. Most existing scheduling algorithms in Grids focus on only one kind of Grid job, which can be data...

CGAT: computational genomics analysis toolkit

Computational genomics seeks to draw biological inferences from genomic datasets, often by integrating and contextualizing next-generation sequencing data. CGAT provides an extensive suite of tools designed to assist in the analysis of genome scale data from a range of standard file formats. The toolkit enables filtering, comparison, conversion, summarization and annotation of genomic intervals...

SARVAVID: A Domain Specific Language for Developing Scalable Computational Genomics Applications

Breakthroughs in gene sequencing technologies have led to an exponential increase in the amount of genomic data. Efficient tools to rapidly process such large quantities of data are critical in the study of gene functions, diseases, evolution, and population variation. These tools are designed in an ad-hoc manner, and require extensive programmer effort to develop and optimize them. Often, such...

Computational fluid dynamics simulations for investigation of parameters affecting goaf gas distribution

It is necessary to obtain a fundamental understanding of the goaf gas flow patterns in longwall mine in order to develop optimum goaf gas drainage and spontaneous combustion (sponcom) management strategies. The best ventilation layout for a longwall underground mine should assist in goaf gas drainage and further reduce the risk of sponcom in the goaf. Further, in the longwall panel, regulators ...

Journal:
  • Bioinformatics

Volume 21, Issue 19

Publication date: 2005